Synopsis

1. Problem Statement: The objective of this project is to analyse covid-19 daily cases in relation to age, gender, medical status and the method by which it was transmitted.

2. Solution Overview: Following cleaning and formatting procedures, the data was separated into a number of sections for exploratory analysis:

  • The first section contains a summary of the overall cases and deaths counts, as well as an analysis of factors such as the growth rate, growth factor and trailing growth factor.

*The second section investigates the correlation between age, gender and the daily case counts.

  • The next section looks at the influence of medical status on daily case counts with regards to hospitalisation and ICU admittance data.

  • The final section explored the proportion of daily case counts that were related to healthcare workers, as well as the primary methods by which the virus was transmitted.

A variety of packages were used for this exercise, each analysis and piece of related code is included and explained.

3. Insights:

This dataset provided a detailed insight into the spread of COVID 19 within the Irish population. It has allowed for some interesting comparisons to be calculated and for the general trends of cases since the 16th March to be mapped out. Our work has looked into various aspects of this data, with specific interest into the pattern of cases, hospitalisations, clusters and the age of people affected by the virus.

Packages Required

The Following Packages were used:
  • Tidyverse: Used to load core packages
  • GGplot2: Used to plot charts
  • GGally: Used to plot charts
  • Dplyr: Used for data manipulation
  • Gridextra: Used to arrange plots in a grid format
  • Plotly: Used to plot interactive charts
  • Hrbrthemes: Used for additional themes
  • Extrafont: Used for additional fonts
  • Cowplot: Used to create publication quality figures
  • Ggpubr: Used to plot charts
  • Zoo: Used for irregular and regular time series data
  • DT: Used to display summary data
  • FBasics: Used to plot charts
  • Highcharter: Used to plot interactive charts
  • KableExtra: Used to generate tables
  • Ggiraph: Used to plot charts
  • Knitr: Used to display an aligned table on the screen
  • magick: An image processing package
  • gganimate: Used to animate graphs
  • gifski: Converts image frames to gifs
suppressMessages(library(tidyverse))
suppressMessages(library(ggplot2))
suppressMessages(library(GGally))
suppressMessages(library(dplyr))
suppressMessages(library(gridExtra))
suppressMessages(library(plotly))
suppressMessages(library(hrbrthemes))
suppressMessages(library(extrafont))
suppressMessages(library(cowplot))
suppressMessages(library(ggpubr))
suppressMessages(library(zoo))
suppressMessages(library(DT))
suppressMessages(library(fBasics))
suppressMessages(library(highcharter))
suppressMessages(library(kableExtra))
suppressMessages(library(ggiraph))
suppressMessages(library(gganimate))
suppressMessages(library(gifski))
suppressMessages(library(magick))
suppressMessages(library(knitr))
suppressMessages(library(viridis))

Data Preparation

This section contains all the steps and procedures that were followed in order to prepare the data for analysis.

Data Import

d<- read.csv('http://opendata-geohive.hub.arcgis.com/datasets/d8eb52d56273413b84b0187a4e9117be_0.csv')
d <- na.omit(d)
covid_data <- d
covid_data <- covid_data %>% select(-(ï..X:Y),-FID)

Data Cleaning

Changing Date Format

The first part of the data cleaning process involved converting the two date columns from a character format to a date format. To do this we first removed all punctuation from the strings, then we removed the zeros from the end of the strings that were related to the timestamp.

covid_data$Date <- gsub("[[:punct:]]", "",covid_data$Date)
covid_data$Date <- gsub('\\s00000000','',covid_data$Date)
covid_data$Date <- as.Date(covid_data$Date,"%Y%m%d")
covid_data$StatisticsProfileDate <- gsub("[[:punct:]]", "",covid_data$StatisticsProfileDate)
covid_data$StatisticsProfileDate <- gsub('\\s00000000','',covid_data$StatisticsProfileDate)
covid_data$StatisticsProfileDate <- as.Date(covid_data$StatisticsProfileDate,"%Y%m%d")
covid_data <-covid_data %>% 
  rename(
    Date1 = StatisticsProfileDate,
  )

Calculating daily cases

Next we calculated the number of new daily cases for each column as the original data was given cumulative format. To do this we subtracted the number of cases recorded for each day by the number of cases recorded on the previous day. In a number of instances, the cumulative number of cases decreased from the previous day which in turn yielded negative new case values. This is due to an error in the original data set. To combat this issue, we assigned all negative new cases values to zero.

covid_data <- covid_data %>%
  rename(
   Total_Male = Male 
  )
covid_data <- covid_data %>%
  rename(
   Total_Female = Female 
  )
covid_data <- covid_data %>%
  rename(
   Total_Unknown = Unknown
  )
covid_data$Daily_Cases <- ave(covid_data$CovidCasesConfirmed, FUN=function(x) c(0,diff(x)))
covid_data$Male <- ave(covid_data$Total_Male, FUN=function(x) c(0,diff(x)))
covid_data$Female <- ave(covid_data$Total_Female, FUN=function(x) c(0,diff(x)))
covid_data$Unknown <- ave(covid_data$Total_Unknown, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged1 <- ave(covid_data$Aged1, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged1to4 <- ave(covid_data$Aged1to4, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged5to14 <- ave(covid_data$Aged5to14, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged15to24 <- ave(covid_data$Aged15to24, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged25to34 <- ave(covid_data$Aged25to34, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged35to44 <- ave(covid_data$Aged35to44, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged45to54 <- ave(covid_data$Aged45to54, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged55to64 <- ave(covid_data$Aged55to64, FUN=function(x) c(0,diff(x)))
covid_data$New_Aged65up <- ave(covid_data$Aged65up, FUN=function(x) c(0,diff(x)))
covid_data$Daily_Hospital_cases <- ave(covid_data$HospitalisedCovidCases, FUN=function(x) c(0,diff(x)))
covid_data$ICU_cases <- ave(covid_data$RequiringICUCovidCases, FUN= function(x) c(0, diff(x)))
covid_data$Hosp_Age5 <- ave(covid_data$HospitalisedAged5, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age5to14 <- ave(covid_data$HospitalisedAged5to14, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age15to24 <- ave(covid_data$HospitalisedAged15to24, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age25to34 <- ave(covid_data$HospitalisedAged25to34, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age35to44 <- ave(covid_data$HospitalisedAged35to44, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age45to54 <- ave(covid_data$HospitalisedAged45to54, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age55to64 <- ave(covid_data$HospitalisedAged55to64, FUN= function(x) c(0,diff(x)))
covid_data$Hosp_Age65up <- ave(covid_data$HospitalisedAged65up, FUN= function(x) c(0,diff(x)))
covid_data$HealthcareWorkersCovidCases<-ave(covid_data$HealthcareWorkersCovidCases, FUN=function(x) c(0,diff(x)))

covid_data$Hosp_Age5 <- ifelse(covid_data$Hosp_Age5 < 0, 0, covid_data$Hosp_Age5)
covid_data$Hosp_Age5to14 <- ifelse(covid_data$Hosp_Age5to14 < 0, 0, covid_data$Hosp_Age5to14)
covid_data$Hosp_Age15to24 <- ifelse(covid_data$Hosp_Age15to24 < 0, 0, covid_data$Hosp_Age15to24)
covid_data$Hosp_Age25to34 <- ifelse(covid_data$Hosp_Age25to34 < 0, 0, covid_data$Hosp_Age25to34)
covid_data$Hosp_Age35to44 <- ifelse(covid_data$Hosp_Age35to4 < 0, 0, covid_data$Hosp_Age35to4)
covid_data$Hosp_Age45to54 <- ifelse(covid_data$Hosp_Age45to54 < 0, 0, covid_data$Hosp_Age45to54)
covid_data$Hosp_Age55to64 <- ifelse(covid_data$Hosp_Age55to64 < 0, 0, covid_data$Hosp_Age55to64)
covid_data$Hosp_Age65up <- ifelse(covid_data$Hosp_Age65up < 0, 0, covid_data$Hosp_Age65up)
covid_data$ICU_cases <- ifelse(covid_data$ICU_cases <0, 0, covid_data$ICU_cases)
covid_data$Daily_Hospital_cases <- ifelse(covid_data$Daily_Hospital_cases<0, 0, covid_data$Daily_Hospital_cases)
covid_data$Daily_Cases<- ifelse(covid_data$Daily_Cases < 0, 0,covid_data$Daily_Cases)
covid_data$Female <- ifelse(covid_data$Female < 0, 0, covid_data$Female)
covid_data$Male <- ifelse(covid_data$Male < 0, 0, covid_data$Male)
covid_data$Unknown <- ifelse(covid_data$Unknown < 0, 0, covid_data$Unknown)
covid_data$New_Aged1 <- ifelse(covid_data$New_Aged1 < 0, 0, covid_data$New_Aged1)
covid_data$New_Aged1to4 <- ifelse(covid_data$New_Aged1to4 < 0, 0, covid_data$New_Aged1to4)
covid_data$New_Aged5to14 <- ifelse(covid_data$New_Aged5to14 < 0, 0, covid_data$New_Aged5to14)
covid_data$New_Aged15to24 <- ifelse(covid_data$New_Aged15to24 < 0, 0, covid_data$New_Aged15to24)
covid_data$New_Aged25to34 <- ifelse(covid_data$New_Aged25to34 < 0, 0, covid_data$New_Aged25to34)
covid_data$New_Aged35to44 <- ifelse(covid_data$New_Aged35to44 < 0, 0, covid_data$New_Aged35to44)
covid_data$New_Aged45to54 <- ifelse(covid_data$New_Aged45to54 < 0, 0, covid_data$New_Aged45to54)
covid_data$New_Aged55to64 <- ifelse(covid_data$New_Aged55to64 < 0, 0, covid_data$New_Aged55to64)
covid_data$New_Aged65up <- ifelse(covid_data$New_Aged65up < 0, 0, covid_data$New_Aged65up)
covid_data$HealthcareWorkersCovidCases<- ifelse(covid_data$HealthcareWorkersCovidCases < 0, 0,covid_data$HealthcareWorkersCovidCases)

Assigning First Entries

Due to the method by which we calculated the new daily cases, the first entry of each column was assigned to zero. The below code reassigns the first entry of the new columns to the first entry of the corresponding column in the original data.

covid_data$Hosp_Age5[1] <- covid_data$Hosp_Age5[1]
covid_data$Hosp_Age5to14[1] <- covid_data$Hosp_Age5to14[1]
covid_data$Hosp_Age15to24[1] <- covid_data$Hosp_Age15to24[1]
covid_data$Hosp_Age25to34[1] <- covid_data$Hosp_Age25to34[1]
covid_data$Hosp_Age35to44[1] <- covid_data$Hosp_Age35to44[1]
covid_data$Hosp_Age45to54[1] <- covid_data$Hosp_Age45to54[1]
covid_data$Hosp_Age55to64[1] <- covid_data$Hosp_Age55to64[1]
covid_data$Hosp_Age65up[1] <- covid_data$Hosp_Age65up[1]
covid_data$ICU_cases[1] <- covid_data$ICU_cases[1]
covid_data$Daily_Hospital_cases[1] <- covid_data$Daily_Hospital_cases
covid_data$Daily_cases[1] <- covid_data$Daily_cases[1]
covid_data$Female[1] <- covid_data$Total_Female[1]
covid_data$Male[1] <- covid_data$Total_Male[1]
covid_data$Unknown[1] <- covid_data$Total_Unknown[1]
covid_data$New_Aged1[1] <- covid_data$Aged1to4[1]
covid_data$New_Aged1to4[1] <- covid_data$Aged1to4[1]
covid_data$New_Aged5to14[1] <- covid_data$Aged5to14[1]
covid_data$New_Aged15to24[1] <- covid_data$Aged15to24[1]
covid_data$New_Aged25to34[1] <- covid_data$Aged25to34[1]
covid_data$New_Aged35to44[1] <- covid_data$Aged35to44[1]
covid_data$New_Aged45to54[1] <- covid_data$Aged45to54[1]
covid_data$New_Aged55to64[1] <- covid_data$Aged55to64[1]
covid_data$New_Aged65up[1] <- covid_data$Aged65up[1]
covid_data$HealthcareWorkersCovidCases[1]<-covid_data$HealthcareWorkersCovidCases[1]

Data Preview

datatable(covid_data)

Data Description

The below table contains the variable names, the data type and a brief description of each.

Variable_name <- colnames(covid_data)
Variable_type <- lapply(covid_data,class)
Variable_desc <- c('Date value that is used with the CovidCasesConfirmed, TotalConfirmedCovidCases, ConfirmedCovidDeaths and TotalCovidDeaths columns', 'Number of daily covid cases','Cumulative number of  daily covid cases', 'Number of daily covid deaths', 'Cumulative number of daily covid deaths', 'Date value that is used with all remaining columns', 'Cumulative number of daily covid cases', 'Cumulative number of daily covid cases that required hospitalisation', 'Cumulative number of daily covid cases that were admitted to the intensive care unit (ICU)', 'Number of daily covid cases that were healthcare workers', 'Number of clusters notified',
'Cumulative number of daily covid cases that required hospitalisation and were aged 5', 'Cumulative number of daily covid cases that required hospitalisation and were aged 5 to 14', 'Cumulative number of daily covid cases that required hospitalisation and were aged 15 to 24', 'Cumulative number of daily covid cases that required hospitalisation and were aged 25 to 34', 'Cumulative number of daily covid cases that required hospitalisation and were aged 35 to 44', 'Cumulative number of daily covid cases that required hospitalisation and were aged 45 to 54', 'Cumulative number of daily covid cases that required hospitalisation and were aged 55 to 64', 'Cumulative number of daily covid cases that required hospitalisation and were aged 65+', 'Cumulative number of daily covid cases whose gender is male', 'Cumulative number of daily covid cases whose gender is female', 'Cumulative number of daily covid cases whose gender is unknown', 'Cumulative number of daily covid cases of those aged 1', 'Cumulative number of daily covid cases of those aged 1 to 4', 'Cumulative number of daily covid cases of those aged 5 to 14', 'Cumulative number of daily covid cases of those aged 15 to 24', 'Cumulative number of daily covid cases of those aged 25 to 34', 'Cumulative number of daily covid cases of those aged 35 to 44', 'Cumulative number of daily covid cases of those aged 45 to 54', 'Cumulative number of daily covid cases of those aged 55 to 64', 'Cumulative number of daily covid cases of those aged 65+', 'Median age of the new covid cases recorded on that day', 'Cumulative number of daily covid cases that were as a result of community tranmission', 'Cumulative number of daily covid cases that were as a result of close contact transmission', 'Cumulative number of daily covid cases that were as a result of travel abroad transmission', 'Number of daily covid cases', 'Number of daily covid cases whose gender is male', 'Number of daily covid cases whose gender is female', 'Number of daily covid cases whose gender is unknown', 'Number of daily covid cases of those aged 1','Number of daily covid cases of those aged 1 to 4', 'Number of daily covid cases of those aged 5 to 14', 'Number of daily covid cases of those aged 15 to 24', 'Number of daily covid cases of those aged 25 to 34', 'Number of daily covid cases of those aged 35 to 44', 'Number of daily covid cases of those aged 45 to 54', 'Number of daily covid cases of those aged 55 to 64', 'Number of daily covid cases of those aged 65+', 'Number of daily covid cases that required hospitalisation', 'Number of daily covid cases that were admitted to the intensive care unit (ICU)', 'Number daily covid cases that required hospitalisation and were aged 5', 'Number daily covid cases that required hospitalisation and were aged 5 to 14', 'Number daily covid cases that required hospitalisation and were aged 15 to 24', 'Number daily covid cases that required hospitalisation and were aged 25 to 34 ', 'Number daily covid cases that required hospitalisation and were aged 35 to 44', 'Number daily covid cases that required hospitalisation and were aged 45 to 54', 'Number daily covid cases that required hospitalisation and were aged 55 to 64', 'Number daily covid cases that required hospitalisation and were aged 65+')

data_desc <- data.frame(cbind(Variable_name,Variable_type,Variable_desc))
row.names(data_desc) <- NULL
colnames(data_desc) <- c('Variable Name', 'Variable Type', 'Variable Description')
kable(data_desc)
Variable Name Variable Type Variable Description
Date Date Date value that is used with the CovidCasesConfirmed, TotalConfirmedCovidCases, ConfirmedCovidDeaths and TotalCovidDeaths columns
ConfirmedCovidCases integer Number of daily covid cases
TotalConfirmedCovidCases integer Cumulative number of daily covid cases
ConfirmedCovidDeaths integer Number of daily covid deaths
TotalCovidDeaths integer Cumulative number of daily covid deaths
Date1 Date Date value that is used with all remaining columns
CovidCasesConfirmed integer Cumulative number of daily covid cases
HospitalisedCovidCases integer Cumulative number of daily covid cases that required hospitalisation
RequiringICUCovidCases integer Cumulative number of daily covid cases that were admitted to the intensive care unit (ICU)
HealthcareWorkersCovidCases numeric Number of daily covid cases that were healthcare workers
ClustersNotified integer Number of clusters notified
HospitalisedAged5 integer Cumulative number of daily covid cases that required hospitalisation and were aged 5
HospitalisedAged5to14 integer Cumulative number of daily covid cases that required hospitalisation and were aged 5 to 14
HospitalisedAged15to24 integer Cumulative number of daily covid cases that required hospitalisation and were aged 15 to 24
HospitalisedAged25to34 integer Cumulative number of daily covid cases that required hospitalisation and were aged 25 to 34
HospitalisedAged35to44 integer Cumulative number of daily covid cases that required hospitalisation and were aged 35 to 44
HospitalisedAged45to54 integer Cumulative number of daily covid cases that required hospitalisation and were aged 45 to 54
HospitalisedAged55to64 integer Cumulative number of daily covid cases that required hospitalisation and were aged 55 to 64
HospitalisedAged65up integer Cumulative number of daily covid cases that required hospitalisation and were aged 65+
Total_Male integer Cumulative number of daily covid cases whose gender is male
Total_Female integer Cumulative number of daily covid cases whose gender is female
Total_Unknown integer Cumulative number of daily covid cases whose gender is unknown
Aged1 integer Cumulative number of daily covid cases of those aged 1
Aged1to4 integer Cumulative number of daily covid cases of those aged 1 to 4
Aged5to14 integer Cumulative number of daily covid cases of those aged 5 to 14
Aged15to24 integer Cumulative number of daily covid cases of those aged 15 to 24
Aged25to34 integer Cumulative number of daily covid cases of those aged 25 to 34
Aged35to44 integer Cumulative number of daily covid cases of those aged 35 to 44
Aged45to54 integer Cumulative number of daily covid cases of those aged 45 to 54
Aged55to64 integer Cumulative number of daily covid cases of those aged 55 to 64
Aged65up integer Cumulative number of daily covid cases of those aged 65+
Median_Age integer Median age of the new covid cases recorded on that day
CommunityTransmission integer Cumulative number of daily covid cases that were as a result of community tranmission
CloseContact integer Cumulative number of daily covid cases that were as a result of close contact transmission
TravelAbroad integer Cumulative number of daily covid cases that were as a result of travel abroad transmission
Daily_Cases numeric Number of daily covid cases
Male numeric Number of daily covid cases whose gender is male
Female numeric Number of daily covid cases whose gender is female
Unknown numeric Number of daily covid cases whose gender is unknown
New_Aged1 numeric Number of daily covid cases of those aged 1
New_Aged1to4 numeric Number of daily covid cases of those aged 1 to 4
New_Aged5to14 numeric Number of daily covid cases of those aged 5 to 14
New_Aged15to24 numeric Number of daily covid cases of those aged 15 to 24
New_Aged25to34 numeric Number of daily covid cases of those aged 25 to 34
New_Aged35to44 numeric Number of daily covid cases of those aged 35 to 44
New_Aged45to54 numeric Number of daily covid cases of those aged 45 to 54
New_Aged55to64 numeric Number of daily covid cases of those aged 55 to 64
New_Aged65up numeric Number of daily covid cases of those aged 65+
Daily_Hospital_cases numeric Number of daily covid cases that required hospitalisation
ICU_cases numeric Number of daily covid cases that were admitted to the intensive care unit (ICU)
Hosp_Age5 numeric Number daily covid cases that required hospitalisation and were aged 5
Hosp_Age5to14 numeric Number daily covid cases that required hospitalisation and were aged 5 to 14
Hosp_Age15to24 numeric Number daily covid cases that required hospitalisation and were aged 15 to 24
Hosp_Age25to34 numeric Number daily covid cases that required hospitalisation and were aged 25 to 34
Hosp_Age35to44 numeric Number daily covid cases that required hospitalisation and were aged 35 to 44
Hosp_Age45to54 numeric Number daily covid cases that required hospitalisation and were aged 45 to 54
Hosp_Age55to64 numeric Number daily covid cases that required hospitalisation and were aged 55 to 64
Hosp_Age65up numeric Number daily covid cases that required hospitalisation and were aged 65+

Exploratory Data Analysis

Total Cases Summary - Dean

Growth Rate, Growth Factor & Trailing Growth Factor

Growth Rate %



The following graphs below illustrate the growth and spread of the virus in terms of the growth factor. The trailing growth factor is displayed for 3 days, 7 days and 14 days. The trailing growth factor is essentially a rolling average of covid-19 cases. It is often a better indicator of how prevalent the virus is spreading in the long-term.

covid_data <-mutate(covid_data, Lagged_TotalConfirm = dplyr::lag(covid_data$TotalConfirmedCovidCases,k=1),
 GrowthRate_percent = (covid_data$ConfirmedCovidCases / Lagged_TotalConfirm) * 100)
covid_data$GrowthRate_percent <- trunc(covid_data$GrowthRate_percent)

Growth Factor & Trailing Growth Factor plots

##Growth Factor
covid_data <-mutate(covid_data, Lagged_Confirm = dplyr::lag(covid_data$ConfirmedCovidCases),
 Growth_Factor = (covid_data$ConfirmedCovidCases / Lagged_Confirm))
covid_data$Growth_Factor <- round(covid_data$Growth_Factor,digits=2)

GR1 <-covid_data %>%
  ggplot( aes(x=Date, y=Growth_Factor)) +
    geom_step(fill="yellow",alpha=0.5) +geom_line(color="yellow") +
    ylab("Growth Factor") + 
    theme_modern_rc()
GR1

##Trailing Growth Factor (3 Days)
library(zoo)
covid_data <- mutate(covid_data, Trailing_Growth3 = rollmean(covid_data$Growth_Factor,3,fill=NA))

GR2 <-covid_data %>%
  ggplot( aes(x=Date, y=Trailing_Growth3)) +
    geom_step(fill="steelblue",alpha=0.5) +geom_line(color="steelblue") +
    ylab(" Trailing Growth Factor (3 days)") + 
    theme_modern_rc()
## Trailing Growth Factor (7 days)
covid_data <- mutate(covid_data, Trailing_Growth7 = rollmean(covid_data$Growth_Factor,7,fill=NA))
GR2

GR3 <-covid_data %>%
  ggplot( aes(x=Date, y=Trailing_Growth7)) +
    geom_step(fill="purple",alpha=0.5) +geom_line(color="purple") +
    ylab(" Trailing Growth Factor (7 days)") + 
    theme_modern_rc()
## Trailing Growth Factor (14 days)
covid_data <- mutate(covid_data, Trailing_Growth14 = rollmean(covid_data$Growth_Factor,14,fill=NA))
GR3

GR4 <-covid_data %>%
  ggplot( aes(x=Date, y=Trailing_Growth14)) +
    geom_step(fill="purple",alpha=0.5) +geom_line(color="green") +
    ylab(" Trailing Growth Factor (14 days)") + 
    theme_modern_rc()
GR4

Cases & Total Cases

Daily Cases

library(plotly)
library(hrbrthemes)
dp1 <- covid_data %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("Daily Covid Cases") +
    theme_ipsum()

dp1 <- ggplotly(dp1)
dp1

Total confirmed Cases Plot

dp2 <-covid_data %>%
  ggplot( aes(x=Date, y=TotalConfirmedCovidCases)) +
    geom_area(fill="red", alpha=0.5) +
    geom_line(color="red") +
    ylab("Total Covid Cases") +
    theme_minimal()
dp2<-ggplotly(dp2)
dp2

Deaths & Total Deaths

Covid Deaths

dp3 <-covid_data %>%
  ggplot( aes(x=Date, y=ConfirmedCovidDeaths)) +
    geom_area(fill="black", alpha=0.5) +
    geom_line(color="black") +
    ylab(" Covid Deaths") +
    theme_minimal()
dp3<-ggplotly(dp3)
dp3

Total Covid Deaths

dp4 <-covid_data %>%
  ggplot( aes(x=Date, y=TotalCovidDeaths)) +
    geom_area(fill="yellow", alpha=0.5) +
    geom_line(color="black") +
    ylab(" Total Covid Deaths") +
    theme_minimal()
dp4<-ggplotly(dp4)
dp4

Daily Cases by Month (March-July)



The graphs shown below are animated daily covid-19 cases from the month of March to July. These graphs provide an insight into the growth of the virus in a live format.

library(gifski)
library(dplyr)
library(gganimate)
covid_data$Month <- format(covid_data$Date, "%b")
covid_data$Day <- format(covid_data$Date, "%d")

covid_data$MonthDay <- format(covid_data$Date, "%d-%b")


##March

March <- dplyr::filter(covid_data,Month=='Mar')
marChart <- March %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="lightblue") +
    ylab("Daily Covid Cases") + xlab("MARCH") +
    theme_ipsum() + transition_reveal(Date) 
##April
April <-dplyr::filter(covid_data,Month=='Apr')
aprChart <- April %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="darkgrey") +
    ylab("Daily Covid Cases") + xlab("APRIL") +
    theme_ipsum() + transition_reveal(Date)
##May
May <-dplyr::filter(covid_data,Month=='May')
mayChart <- May %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="green") +
    ylab("Daily Covid Cases") + xlab("MAY") +
    theme_ipsum()  + transition_reveal(Date)
##June
June <-dplyr::filter(covid_data,Month=='Jun')
junChart <- June %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="purple") +
    ylab("Daily Covid Cases") + xlab("JUNE") +
    theme_ipsum() + transition_reveal(Date)
##July
July <-dplyr::filter(covid_data,Month=='Jul')
julChart <- July %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="red") +
    ylab("Daily Covid Cases") + xlab("JULY") +
    theme_ipsum() + transition_reveal(Date)
library(magick)
library(gifski)
mar_gif <- animate(marChart, width = 400, height = 400)
apr_gif <- animate(aprChart, width = 400, height = 400)
may_gif <- animate(mayChart, width = 400, height = 400)
jun_gif <- animate(junChart, width = 400, height = 400)
jul_gif <- animate(julChart, width = 400, height = 400)
mar_mgif <- image_read(mar_gif)
apr_mgif <- image_read(apr_gif)
may_mgif <- image_read(may_gif)
jun_mgif <- image_read(jun_gif)
jul_mgif <- image_read(jul_gif)
mar_mgif

apr_mgif

may_mgif

jun_mgif

jul_mgif

Daily Cases by Month (August-December)



The graphs shown below are animated daily covid-19 cases from the month of August to December.

##August
August <-dplyr::filter(covid_data,Month=='Aug')
augChart <- August %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="lightgreen") +
    ylab("Daily Covid Cases") + xlab("AUGUST") +
    theme_ipsum() + transition_reveal(Date)
##September
September <-dplyr::filter(covid_data,Month=='Sep')
sepChart <- September %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="orange") +
    ylab("Daily Covid Cases") + xlab("SEPTEMBER") +
    theme_ipsum() + transition_reveal(Date)
##October 
October <-dplyr::filter(covid_data,Month=='Oct')
octChart <- October %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="pink") +
    ylab("Daily Covid Cases") + xlab("OCTOBER") +
    theme_ipsum() + transition_reveal(Date)
##November
November <-dplyr::filter(covid_data,Month=='Nov')
novChart <- November %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="yellow") +
    ylab("Daily Covid Cases") + xlab("NOVEMBER") +
    theme_ipsum()  + transition_reveal(Date)
##December 
December <-dplyr::filter(covid_data,Month=='Dec')
decChart <- December %>%
  ggplot( aes(x=Date, y=ConfirmedCovidCases)) +
    geom_line(color="blue") +
    ylab("Daily Covid Cases") + xlab("DECEMBER") +
    theme_ipsum() + transition_reveal(Date)
aug_gif <- animate(augChart, width = 400, height = 400)
sep_gif <- animate(sepChart, width = 400, height = 400)
oct_gif <- animate(octChart, width = 400, height = 400)
nov_gif <- animate(novChart, width = 400, height = 400)
dec_gif <- animate(decChart, width = 400, height = 400)
aug_mgif <- image_read(aug_gif)
sep_mgif <- image_read(sep_gif)
oct_mgif <- image_read(oct_gif)
nov_mgif <- image_read(nov_gif)
dec_mgif <- image_read(dec_gif)
aug_mgif

sep_mgif

oct_mgif

nov_mgif

dec_mgif

New Cases by Age and Gender - Stephen

In this section of the analysis, we explored the variation in the number of new covid-19 cases according to age-group and gender. This was done using a combination of line plots, bar charts, heat maps, boxplots and some interactive plots where possible.

Summary of New Case Data

The below table contains summaries of each of the variables that will be used in this section. The first three columns are related to the gender, and the remaining columns are related to age-group

s <- basicStats(covid_data %>% select(Male:New_Aged65up))[c("Mean", "Stdev", "Median", "Minimum", "Maximum", "nobs"),]
datatable(s,width='100%')

Covid Cases According to Age

Daily Covid Cases According to Age

This section of the analysis deals specifically with the age variable. The first three plots are related to the number of new daily Covid-19 cases.

From the below plots, it can be observed that the distribution of new daily covid-19 cases is approximately bimodal, with peaks between the March to May and September to November regions. The 15-24 and 65+ age-groups have the highest numbers of new daily cases. The highest peak of new cases recorded across the ages was 427, which occurred on the 22nd of April in the Aged 65+ group. The Aged 65+ group made up the largest proportion of new cases recorded during the March to May period. Interestingly, the Aged 15-24 group made up the largest proportion of new cases recorded during the September to November period, which coincided with schools and colleges returning after the summer break.

It is important to note that some the spikes observed in the overall daily cases plot, such as the 102 cases which were recorded on the 4th of July, are due to errors in the original data set that could not be amended. The original new case data was given as the cumulative number of cases, where the figure for the cases on day 3 was obtained by adding the number of new cases on day 3 to the total cases from days 1 and 2. In a number of instances, this cumulative number of cases decreased from the previous day which in turn yielded negative new case values. To combat this issue, we assigned all negative new cases values to zero which occasionally resulted in spikes in the plots that are not entirely accurate. All errors of this nature will be identified in subsequent plots.

cases <- covid_data %>%
  select(Date1,New_Aged1,New_Aged1to4,New_Aged5to14,New_Aged15to24,New_Aged25to34,New_Aged35to44,New_Aged45to54,New_Aged55to64,New_Aged65up) %>%
  gather(key='Age_Group',value = 'Cases',-Date1)
cases$Age_Group <- factor(cases$Age_Group, 
                          levels = c('New_Aged1','New_Aged1to4','New_Aged5to14','New_Aged15to24','New_Aged25to34','New_Aged35to44','New_Aged45to54','New_Aged55to64','New_Aged65up'))
levels(cases$Age_Group) <- c('Aged 1','Aged 1 to 4','Aged 5 to 14','Aged 15 to 24','Aged 25 to 34','Aged 35 to 44','Aged 45 to 54','Aged 55 to 64','Aged 65+')


ggplot(cases,aes(x=Date1,y=Cases,group=1)) + geom_line(aes(colour= Age_Group)) + ylab('Count of Covid Cases') +
 ggtitle('Daily Cases According to Age Group') + xlab('Date') + theme(plot.title = element_text(hjust = 0.5,face='bold')) + 
scale_color_discrete(name='Age Group', labels=c('Aged 1','Aged 1 to 4','Aged 5 to 14','Aged 15 to 24','Aged 25 to 34','Aged 35 to 44','Aged 45 to 54','Aged 55 to 64','Aged 65+'))

A <- cases %>% hchart('line',hcaes(x=Date1, y=Cases, group=Age_Group))
A <- A %>% 
  hc_title(text = "Interactive Plot of Daily Cases According to Age Group",
           style = list(fontWeight = "bold"),
           align = "center") %>%
  hc_xAxis(title = list(text = "Date"),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Count of Covid Cases"),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text= 'Age Group'),layout = "vertical", verticalAlign = 'middle',
            align = "right") %>%
  hc_tooltip(crosshairs = TRUE,
             shared = TRUE, 
             borderWidth = 4) %>% 
  hc_exporting(enabled = TRUE)
A
A1 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged1to4, type='scatter', mode = 'lines',name='Aged 1 to 4')
A1 <- A1 %>% layout(yaxis = list(title = '',range=c(0,400)))
A2 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged5to14, type='scatter', mode = 'lines',name='Aged 5 to 14')
A2 <- A2 %>% layout(yaxis = list(title = '',range=c(0,400)))
A3 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged15to24, type='scatter', mode = 'lines',name='Aged 15 to 24')
A3 <- A3 %>% layout(yaxis = list(title = '',range=c(0,400)))
A4 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged25to34, type='scatter', mode = 'lines',name='Aged 25 to 34')
A4 <- A4 %>% layout(yaxis = list(title = '',range=c(0,400)))
A5 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged35to44, type='scatter', mode = 'lines',name='Aged 35 to 44')
A5 <- A5 %>% layout(yaxis = list(title = '',range=c(0,400)))
A6 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged45to54, type='scatter', mode = 'lines',name='Aged 45 to 54')
A6 <- A6 %>% layout(yaxis = list(title = '',range=c(0,400)))
A7 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged55to64, type='scatter', mode = 'lines',name='Aged 55 to 64')
A7 <- A7 %>% layout(yaxis = list(title = '',range=c(0,400)))
A8 <- plot_ly(data=covid_data,x= ~Date1,y= ~New_Aged65up, type='scatter', mode = 'lines',name='Aged 65+')
A8 <- A8 %>% layout(yaxis = list(title = '',range=c(0,400)))
A9 <- subplot(A1,A2,A3,A4,A5,A6,A7,A8,shareY = T,nrows=4)
A9 <- A9 %>% layout(title= '<b> Interactive Plot of Daily Cases According to Age Group ',
                    legend=list(title=list(text='<b> Age Group </b>')))

A9

The heat map below further illustrates the distribution of new cases across the different age groups, where the darkest areas are observed in the April to May region for the Aged 65+ group, and in the October to November regions for the Aged 15-24 group.

i <- ggplot(data=cases, aes(x=Date1, y=Age_Group, fill=Cases)) + geom_tile() + scale_fill_viridis(discrete=FALSE) +  ggtitle("Heat Map: Age Group vs Date") +
  theme(plot.title = element_text(hjust = 0.5,face='bold')) +
  xlab("Date")+
  ylab("Age Group")
ggplotly(i)

Monthly Covid Cases According to Age

In order to get a clearer picture of the overall distribution of covid-19 across the different age groups, we decided to produce a plot of the monthly cases. The highest monthly peak of new covid cases was 6351 and was recorded during the month of October for the Aged 15-24 group. The month of October appears to have the largest number of overall cases, followed then by April.

Parsed_data_age <- covid_data
Parsed_data_age$month<-strftime(as.POSIXct(covid_data$Date1), "%m")
Parsed_data_age$year=strftime(as.POSIXct(covid_data$Date1), "%Y")
bymonth <- aggregate(cbind(New_Aged1,New_Aged1to4,New_Aged5to14,New_Aged15to24,New_Aged25to34,New_Aged35to44,New_Aged45to54,New_Aged55to64,New_Aged65up) ~ month + year, Parsed_data_age, FUN = sum)
bymonth$Date<-as.POSIXct(paste(bymonth$year, bymonth$month, "01", sep = "-"),'GMT')

bymonth<-bymonth%>%select(Date,New_Aged1,New_Aged1to4,New_Aged5to14,New_Aged15to24,New_Aged25to34,New_Aged35to44,New_Aged45to54,New_Aged55to64,New_Aged65up)%>%gather(key='Age_Group',value = 'Cases',-Date)
bymonth$Date=as.Date(bymonth$Date)
bymonth$order <- c(1:length(bymonth$Cases))

bymonth$Age_Group <- factor(bymonth$Age_Group, 
                          levels = c('New_Aged1','New_Aged1to4','New_Aged5to14','New_Aged15to24','New_Aged25to34','New_Aged35to44','New_Aged45to54','New_Aged55to64','New_Aged65up'))

levels(bymonth$Age_Group) <- c('Aged 1','Aged 1 to 4','Aged 5 to 14','Aged 15 to 24','Aged 25 to 34','Aged 35 to 44','Aged 45 to 54','Aged 55 to 64','Aged 65+')




AM <- bymonth %>% 
  hchart('column',hcaes(x='Date',y='Cases',group='Age_Group'),pointwidth=3)
AM <- AM %>% hc_title(text = "Interactive Plot of Daily Cases According to Age Group",
                      style = list(fontWeight = "bold"),
                      align = "center") %>%
  hc_xAxis(title = list(text = "Date"),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Count of Covid Cases"),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text= 'Age Group'),layout = "vertical", verticalAlign = 'middle',
            align = "right")
AM

Distribution of Covid Cases Across Ages

The variation in the number new cases is approximately similar for the 15-24, 25-34, 35-44, 45-54, 55-64 and 65+ age groups, which are all skewed left. A large number of outliers can be observed in the 15-24 and 65+ groups.

V <- plot_ly(data=covid_data,y= ~New_Aged1, type= 'box',name='Aged 1')
V <- V %>% add_trace(y= ~New_Aged1to4,name='Aged 1 to 4')
V <- V %>% add_trace(y= ~New_Aged5to14,name='Aged 5 to 14')
V <- V %>% add_trace(y= ~New_Aged15to24,name='Aged 15 to 24')
V <- V %>% add_trace(y= ~New_Aged25to34,name='Aged 25 to 34')
V <- V %>% add_trace(y= ~New_Aged35to44,name='Aged 35 to 44')
V <- V %>% add_trace(y= ~New_Aged45to54,name='Aged 45 to 54')
V <- V %>% add_trace(y= ~New_Aged55to64,name='Aged 55 to 64')
V <- V %>% add_trace(y= ~New_Aged65up,name='Aged 65+')
V <- V %>% layout(title = '<b> Variation in Cases by Age Group </b>',
                    xaxis = list(title = '<b> Age Group'),
                    yaxis = list(title = '<b> Count of Covid Cases </b>'),
                    showlegend = F)
V

Covid Cases According to Gender

This part of the analysis deals with the distribution of new covid-19 cases according to gender.

Daily Covid Cases According to Gender

The three sharp peaks observed in the plots below are due to errors in the dataset and will be omitted from the discussion. Overall, the number of new daily cases is approximately similar for males and females. The highest peak of new cases was 638 females and was recorded on the 14th of April.

G <- plot_ly(data=covid_data,x= ~Date1,y= ~Female, type='scatter', mode = 'lines',name='Female')
G <- G %>% add_trace(y= ~Male, mode='lines',name='Male')
G <- G %>% add_trace(y= ~Unknown, mode='lines',name='Unknown')
G <- G %>% layout(title='<b> Daily Cases According to Gender </b>',
                  xaxis= list(title = '<b> Date </b>'),
                  yaxis= list(title ='<b> Count of Covid Cases <b>'),
                  legend=list(title=list(text='<b> Gender </b>')))
G
G1 <- plot_ly(data=covid_data,x= ~Date1,y= ~Female, type='scatter', mode = 'lines',name='Female')
G1 <- G1 %>% layout(yaxis = list(title = '<b> Count of Covid Cases </b>')) 
G2 <- plot_ly(data=covid_data,x= ~Date1,y= ~Male, type='scatter', mode = 'lines',name='Male')
G3 <- plot_ly(data=covid_data,x= ~Date1,y= ~Unknown, type='scatter', mode = 'lines',name='Unknown')
G4 <- subplot(G1,G2,G3,shareY = T)
G4 <- G4 %>% layout(title= '<b> Interactive Plot of Daily Covid Cases According to Gender',
                    legend=list(title=list(text='<b> Gender </b>')))
G4

Monthly Covid Cases According to Gender

Excluding the errors identified above, we can see that the largest number of cases for both males and females was recorded during the month of October. The average number of cases recorded each month for males and females is approximately similar.

Parsed_data_gender <- covid_data
Parsed_data_gender$month<-strftime(as.POSIXct(covid_data$Date1), "%m")
Parsed_data_gender$year=strftime(as.POSIXct(covid_data$Date1), "%Y")

bymonth1 <- aggregate(cbind(Male,Female,Unknown) ~ month + year, Parsed_data_gender, FUN = sum)
bymonth1$start_date<-as.POSIXct(paste(bymonth1$year, bymonth1$month, "01", sep = "-"),'GMT')

bymonth1<-bymonth1%>%select(start_date,Male,Female,Unknown)%>%gather(key='type',value = 'value',-start_date)
bymonth1$start_date=as.Date(bymonth1$start_date)
bymonth1$order <- c(1:length(bymonth1$value))


GB <- bymonth1 %>% 
  hchart('column',hcaes(x='start_date',y='value',group='type'),pointwidth=3)
GB <- GB %>% hc_title(text = "Interactive Plot of Monthly Cases According to Gender",
                      style = list(fontWeight = "bold"),
                      align = "center") %>%
  hc_xAxis(title = list(text = "Date"),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Count of Covid Cases"),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text= 'Gender'),layout = "vertical", verticalAlign = 'middle',
            align = "right") %>%
  hc_colors(c('#FC022C','#6A6A6A','#036D1E'))
GB

Hospitalisations - Rachel

This section looks at the hospital and ICU admittances that occured to date in Ireland from the 16th March. Analysis was completed using plots and graphs that demonstrate the trends of hospitalisations based on date and age

Overview

Table_1 provides an outline to the variations within the data.

Table_1 <- basicStats(covid_data %>% select(Hosp_Age5:Hosp_Age65up))[c("Mean", "Stdev", "Median", "Minimum", "Maximum", "nobs"),]
kbl(Table_1) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Hosp_Age5 Hosp_Age5to14 Hosp_Age15to24 Hosp_Age25to34 Hosp_Age35to44 Hosp_Age45to54 Hosp_Age55to64 Hosp_Age65up
Mean 0.263158 0.244361 0.721805 1.345865 1.624060 2.560150 2.906015 12.17293
Stdev 0.581407 0.573360 0.997002 1.895556 2.288945 3.727277 4.035994 25.14907
Median 0.000000 0.000000 0.000000 1.000000 1.000000 1.000000 1.000000 5.00000
Minimum 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
Maximum 4.000000 3.000000 5.000000 12.000000 11.000000 23.000000 23.000000 354.00000
nobs 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.000000 266.00000

Cases Organised by Age

Overview on Age related data

Hospital <- covid_data %>%
            select(Date1,Hosp_Age5: Hosp_Age65up) %>%
            gather(key='Hosp_Age',value = 'Cases',-Date1)
            Hospital$Hosp_Age <- factor(Hospital$Hosp_Age, 
            levels = c("Hosp_Age5", "Hosp_Age5to14", "Hosp_Age15to24", "Hosp_Age25to34", "Hosp_Age35to44", "Hosp_Age45to54", "Hosp_Age55to64", "Hosp_Age65up"))
            
#Age-related data: all new cases throughout time points
P4 <- ggplot(Hospital, mapping = aes(x=Date1, y=Cases))+
      geom_bar(position= 'dodge', stat = 'identity')+
      xlab("Month")+
      ylab("Cases")+
      theme_ipsum("Rockwell")

##Scatter plot on all age related new data
P5 <- ggplot(Hospital, mapping = aes(x=Date1, y=Cases, colour= Hosp_Age))+
      geom_point(stat='identity', size= 0.9)+
      theme_ipsum("Rockwell")+
      labs(colour = "Age Groups", size= 1)+
      theme(legend.position = "top")+
      xlab("Month")+
      ylab("Cases")+
      scale_color_discrete(labels = c("Hosp_Age5", "Hosp_Age5to14", "Hosp_Age15to24", "Hosp_Age25to34", "Hosp_Age35to44", "Hosp_Age45to54", "Hosp_Age55to64", "Hosp_Age65up"))


H2<- ggarrange(P4, P5, heights = c(13, 13, 3), align = "v",
          common.legend = TRUE, legend = "bottom",
          labels = c("A", "B"),
          ncol = 1, nrow = 2)+
          ggtitle("Cases Organised by Age")
H2



The above graphs plot all the hospital admittance from the 16th March to date. A) There is a spike in hospital admittances in April. B) The peak is caused by a sharp increase in Hospital cases within the age group 65 and up.

P6 <- Hospital %>% hchart('line',hcaes(x=Date1, y=Cases, group=Hosp_Age))
P6 <- P6 %>% 
  hc_title(text = "Cases arranged by Age Group",
           style = list(fontWeight = "bold"),
           align = "center") %>%
  hc_xAxis(title = list(text = "Date"),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Count of Covid Cases"),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text= 'Age Group'),layout = "vertical", verticalAlign = 'middle',
            align = "right") %>%
  hc_tooltip(crosshairs = TRUE,
             shared = TRUE, 
             borderWidth = 4) %>% 
  hc_exporting(enabled = TRUE)
P6



P6 shows all cases arranged by Age, it is clear from the above graph that the age group 65 and up contains the highest number of cases in Ireland.

P7 <- plot_ly(data=covid_data, y=~Hosp_Age5, type = 'box')
P7 <- P7 %>% add_trace(y= ~Hosp_Age5,name='Aged 5')
P7 <- P7 %>% add_trace(y= ~Hosp_Age5to14,name='Aged 5 to 14')
P7 <- P7 %>% add_trace(y= ~Hosp_Age15to24,name='Aged 15 to 24')
P7 <- P7 %>% add_trace(y= ~Hosp_Age25to34,name='Aged 25 to 34')
P7 <- P7 %>% add_trace(y= ~Hosp_Age35to44,name='Aged 35 to 44')
P7 <- P7 %>% add_trace(y= ~Hosp_Age45to54,name='Aged 45 to 54')
P7 <- P7 %>% add_trace(y= ~Hosp_Age45to54,name='Aged 55 to 64')
P7 <- P7 %>% add_trace(y= ~Hosp_Age65up,name='Aged 65+')
P7 <- P7 %>% layout(title = '<b> Variation in Cases by Age Group </b>',
                 xaxis = list(title = '<b> Age Group'),
                 yaxis = list(title = '<b> Count of Covid Cases </b>', range=c(0,75)),
                 showlegend = F)
P7



P7 is an interactive visual boxplot that outlines all cases arranged by age from the 16th March.

covid_data1 <-  data.frame(Date = covid_data$Date1,covid_data %>% select(c(New_Aged5to14:New_Aged65up,Hosp_Age5:Hosp_Age65up)))

covid_data1 <- covid_data1 %>%
  rename(
    'Aged 5 to 14' = New_Aged5to14
  )
covid_data1 <- covid_data1 %>%
  rename(
    'Aged 15 to 24' = New_Aged15to24
  )
covid_data1 <- covid_data1 %>%
  rename(
    'Aged 25 to 34 ' = New_Aged25to34
  )
covid_data1 <- covid_data1 %>%
  rename(
    'Aged 35 to 44' = New_Aged35to44
  )
covid_data1 <- covid_data1 %>%
  rename(
    'Aged 45 to 54' = New_Aged45to54
  )
covid_data1 <- covid_data1 %>%
  rename(
    'Aged 55 to 64' = New_Aged55to64
  )
covid_data1 <- covid_data1 %>%
  rename(
    'Aged 65+' = New_Aged65up
  )

s <- apply(covid_data1[,2:16],2,sum)
Tot_cases <- s[1:7]
Hosp_cases <- s[9:15]
Non_Hosp_Cases <- Tot_cases - Hosp_cases
s1 <- cbind(Non_Hosp_Cases,Hosp_cases)
s1 <- prop.table(s1,1)
s2 <- data.frame(s1)
names <- c('Aged 5 to 14','Aged 15 to 24','Aged 25 to 34','Aged 35 to 44','Aged 45 to 54','Aged 55 to 64','Aged 65+')
s2$Group <- names
s2 <- s2 %>%
  rename(
    Hospitalised = Hosp_cases
  )

s2 <- s2 %>%
  rename(
    'Not Hospitalised' = Non_Hosp_Cases
  )
s2 <- s2 %>%
  select(Group,'Not Hospitalised',Hospitalised) %>%
  gather(key='Status',value='Percentage',-Group)
s2$Percentage1 <- 100*s2$Percentage 
s2$Percentage1 <- round(s2$Percentage1, digits = 0)

S <- s2 %>% hchart('column',hcaes(x='Group',y='Percentage1',group='Status'),pointwidth=3) 
S <- S %>% hc_title(text = "Hospitalisation Percentage According to Age Group",
                    style = list(fontWeight = "bold"),
                    align = "center") %>%
  hc_xAxis(title = list(text = 'Age Group'),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Percentage (%)",max =100),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text= 'Status'),layout = "vertical", verticalAlign = 'middle',
            align = "right")

S

The above plot shows the percentage of cases that required hospitalisation according to age group. As age increases, it can be observed that the percentage of cases requiring hospitalisation increases significantly from a 1% hospitalisation rate for those aged 15 to 24, to a 28% hospitalisation rate for those aged 65+.

Cases Organised by Month

This section looks at the trends in the data arranged by months of the year.

Cases organised within months

Month_data<- select(covid_data, c(Date1, Hosp_Age5:Hosp_Age65up))

Month_data$Date1 <- as.Date(Month_data$Date1)
Months<- months(Month_data$Date1)
Month_data$months <- Months
Month_data2 <- select(Month_data, months, Hosp_Age5:Hosp_Age65up)
Month_data2$months = factor(Month_data2$months, levels = month.name)
Month_data2 <- pivot_longer(Month_data2, -months, names_to = "Hosp_Ages", values_to = "Cases")

Month_data2$Hosp_Ages <- factor(Month_data2$Hosp_Ages, levels = c('Hosp_Age5', 'Hosp_Age5to14', 'Hosp_Age15to24', 'Hosp_Age25to34', 'Hosp_Age35to44', 'Hosp_Age45to54', 'Hosp_Age55to64', 'Hosp_Age65up'))

P8 <- ggplot(Month_data2,aes(x=months,y=Cases,group=1)) + 
                 geom_point(aes(colour= Hosp_Ages), stat='identity') + 
                 ylab('Count of Covid Cases') +
                 ggtitle('Hospital Cases According to Age Group') + 
                 xlab('Months') + 
                 theme(plot.title = element_text(hjust = 0.5,face='bold')) + 
                 scale_color_discrete(name='Hospital Ages', labels=c('Ages 5', 'Aged 5 to 14',
                 'Aged 15 to 24','Aged 25 to 34','Aged 35 to 44','Aged 45 to 54','Aged 55 to 64',
                 'Aged 65+'))+
                 coord_flip()+
                 theme_ipsum("Rockwell")
P8



P8 shows cases per-month colour coded based on age groups within the data. It is clear that the age group 65 up has the highest rate of cases out of all the age groups. The distribution of cases is also linked to the time periods of lockdown in Ireland, with the highest hospital case rates occurring in April.

P9 <- ggplot(Month_data2, aes(x= months,  y= Cases, tooltip=months, fill=months, data_id= months))+
          geom_boxplot_interactive(outlier.colour = "seagreen")+
          guides(fill = "none") + theme_minimal()+
          coord_flip()+
          xlab("Months")
P9



P9 is an interactive boxplot of all the cases occuring within each month since March of 2020. The green dots represent outliers within the dataset.

Monthly data

Average_Month_Data<- Month_data%>%
          group_by(months)%>%
summarise(Hosp_Age5=sum(Hosp_Age5), Hosp_Age5to14=sum(Hosp_Age5to14), Hosp_Age15to24=sum(Hosp_Age15to24), Hosp_Age25to34=sum(Hosp_Age25to34), Hosp_Age35to44=sum(Hosp_Age35to44), Hosp_Age45to54=sum(Hosp_Age45to54),
Hosp_Age55to64=sum(Hosp_Age55to64),  Hosp_Age65up=sum(Hosp_Age65up))
## `summarise()` ungrouping output (override with `.groups` argument)
Month <- c(4,8,12,7,6,3,5,11,10,9)
Average_Month_Data$months <-Month
Month_Average<- Average_Month_Data%>%
                mutate(Month = factor(month.name[Month], levels= month.name))%>%
                arrange(Month)
Monthly_Average <- mutate(Month_Average, Total_sum = Hosp_Age5+ Hosp_Age5to14+ Hosp_Age15to24+ Hosp_Age25to34+ Hosp_Age35to44+ Hosp_Age45to54+ Hosp_Age55to64+ Hosp_Age65up)
Parsed_data_hosp <- covid_data
Parsed_data_hosp$month<-strftime(as.POSIXct(covid_data$Date1), "%m")
Parsed_data_hosp$year=strftime(as.POSIXct(covid_data$Date1), "%Y")
bymonth2 <- aggregate(cbind(Daily_Cases,Daily_Hospital_cases) ~ month + year, Parsed_data_hosp, FUN = sum)
bymonth2$Date<-as.POSIXct(paste(bymonth2$year, bymonth2$month, "01", sep = "-"),'GMT')

t <- bymonth2 %>% group_by(Date) %>% 
  summarise(Total = sum(Daily_Cases), Hospitalised = sum(Daily_Hospital_cases))
## `summarise()` ungrouping output (override with `.groups` argument)
t <- data.frame(t)
t$Not_Hospitalised <- t$Total - t$Hospitalised
t$Hospitalised <- t$Hospitalised/t$Total
t$Not_Hospitalised <- t$Not_Hospitalised/t$Total
t <- t %>%
  rename(
    'Not Hospitalised' = Not_Hospitalised
  )

t1<- t%>%select(Date,Hospitalised,'Not Hospitalised')%>%gather(key='Status',value = 'Percent',-Date)


t1$Percentage <- 100*t1$Percent
t1$Percentage <- round(t1$Percentage, digits = 0)
t1$Date=as.Date(t1$Date)


T <- t1 %>% hchart('column',hcaes(x='Date',y='Percentage',group='Status'),pointwidth=3) 
T <- T %>% hc_title(text = "Hospitalisation Percentage vs Date",
                    style = list(fontWeight = "bold"),
                    align = "center") %>%
  hc_xAxis(title = list(text = 'Date'),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Percentage (%)",max =100),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text= 'Status'),layout = "vertical", verticalAlign = 'middle',
            align = "right")
T

The percentage of cases requiring hospitalisation was found to be highest in the month of March, which coincides with a large number of cases in those who were aged in the 65+ group.

Treemap on All Cases in Months

Month_Average2 <- select(Monthly_Average, Month, Total_sum)
P10 <- Month_Average2 %>%
  hchart(
    "treemap", 
    hcaes(x = Month, value = Total_sum, color = Total_sum)
    )
P10



P10 is a treemap outlining all the cases that occur within the months of the data. It is very clear that the month of April contains the highest number of hospital cases, and that the summer months of June -August have the lowest number of hospital cases. These dates also correspond to the lockdown periods in Ireland.

Summary data table

Month_Average3 <- select(Month_Average, Month, Hosp_Age5:Hosp_Age65up)
Month_Average_row<- Month_Average3 %>%
                    bind_rows(summarise(.,
                      across(where(is.double), sum),
                      across(where(is.factor), ~"Total")))
Total <- Month_Average_row[11,]
Month_Average <- rbind(Month_Average3, Total)
Table_2 <- mutate(Month_Average, Total_sum = Hosp_Age5+ Hosp_Age5to14+ Hosp_Age15to24+ Hosp_Age25to34+ Hosp_Age35to44+ Hosp_Age45to54+ Hosp_Age55to64+ Hosp_Age65up)

kbl(Table_2) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Month Hosp_Age5 Hosp_Age5to14 Hosp_Age15to24 Hosp_Age25to34 Hosp_Age35to44 Hosp_Age45to54 Hosp_Age55to64 Hosp_Age65up Total_sum
March 5 3 20 61 70 141 122 401 823
April 14 9 29 98 147 221 271 1370 2159
May 3 9 16 30 31 71 73 304 537
June 2 2 4 6 12 7 16 43 92
July 4 2 8 5 9 13 8 25 74
August 4 1 12 15 11 11 10 27 91
September 9 7 18 12 17 34 41 106 244
October 15 20 43 70 62 83 123 397 813
November 12 9 38 50 66 92 90 476 833
December 2 3 4 11 7 8 19 89 143
Total 70 65 192 358 432 681 773 3238 5809



Table 2 contains the total number of hospital cases within each month, categorised by age group.

Ages <- select(Month_Average, Hosp_Age5:Hosp_Age65up)
Ages2 <- Ages[11,]
Title <- c("Ages", "Total")
Ages2 <- cbind(Ages2, Title)
Ages3 <- pivot_longer(Ages2, -Title, names_to = "Ages", values_to = "Total")
Ages4 <- select(Ages3, Ages:Total)
Ages4 <- Ages4[1:9,]

P11 <- Ages4 %>%
  hchart(
    "treemap", 
    hcaes(x = Ages, value = Total, color = Total)
    )
P11



P11 is an interactive treemap highlighting the rate of hospital cases according to age groups within the data. It is clear that the age group 65 up contains the highest hospital rates out of all the age groups within the data.

Transmission & Healthcare Workers - Mathews

Overview of Health Workers Data

Health Workers Data Preview

workersData <- covid_data%>%select(Date1,Daily_Cases,HealthcareWorkersCovidCases,Daily_Hospital_cases)
workersData<-workersData%>%mutate(other=ifelse(Daily_Cases!=0,Daily_Cases-HealthcareWorkersCovidCases,0))
hw <- basicStats(workersData[2:5])[c("Mean", "Stdev", "Median", "Minimum", "Maximum", "nobs"),]
head(workersData)
##        Date1 Daily_Cases HealthcareWorkersCovidCases Daily_Hospital_cases other
## 1 2020-03-18           0                           0                    0     0
## 2 2020-03-19         146                          33                   33   113
## 3 2020-03-20         128                          12                   38   116
## 4 2020-03-21         124                          49                   28    75
## 5 2020-03-22         129                          39                   38    90
## 6 2020-03-23         181                          36                   28   145

Health Workers Summary Table

datatable(hw)


From the summary table we can say that there is an average count of Health Workers Covid Cases 46.

Visualisation of Health Workers Data

Data Preparation Section

total_health<-sum(workersData$HealthcareWorkersCovidCases)
Daily_Cases<-sum(workersData$Daily_Cases)
total_others<-Daily_Cases-total_health



data_pre<- data.frame(type=c('Health Workers','Others'),count=c(total_health,total_others),
                      prop=c((total_health/Daily_Cases)*100,(total_others/Daily_Cases)*100))
data_pre  =data_pre%>%arrange(desc(prop))
data_pre$lab_pos<-cumsum(data_pre$prop)-0.5*data_pre$prop



  parsed_data<-workersData
  parsed_data$month=strftime(as.POSIXct(workersData$Date1), "%m")
  parsed_data$year=strftime(as.POSIXct(workersData$Date1), "%Y")
  data_month<- aggregate(cbind(HealthcareWorkersCovidCases ,Daily_Hospital_cases) ~ month + year, parsed_data, FUN = sum)
  data_month$start_date<-as.POSIXct(paste(data_month$year, data_month$month, "01", sep = "-"),'GMT')
  
  data_month<-data_month%>%select(start_date,HealthcareWorkersCovidCases,Daily_Hospital_cases)%>%gather(key='type',value = 'value',-start_date)
  data_month$start_date=as.Date(data_month$start_date)
  
  data_day<-parsed_data%>%select(Date1,HealthcareWorkersCovidCases ,Daily_Hospital_cases)%>%gather(key='type',value = 'value',-Date1)


Overall Cases Proportion of Health Workers

 ggplot(data_pre, aes(x = 2, y = prop, fill = type)) +
    geom_bar(stat = "identity", color = "white") +
    scale_fill_discrete(name = "Covid Cases")+
    coord_polar(theta = "y", start = 0)+
    geom_text(aes( y = lab_pos,label = round(prop,2)), color = "white") +
    theme_void()+
    labs(title = 'Donut Chart for Overall Cases Proportion')+
    xlim(0.5, 2.5)


From the Donut Chart we can say that 16% of the Total covid Cases reported are Health Workers.In order to analyse the pattern of Health Workers Covid Case Counts by using a bar plot.


Bar Graph Health Workers Covid Cases

plt0<-ggplot(workersData, aes(x = Date1 , y = HealthcareWorkersCovidCases)) +
  geom_bar(fill = "#0073C2FF", stat = "identity") +
  theme_pubclean()+xlab('Month') +ylab('Covid Cases Count')

ggplotly(plt0)


From the bar plot we can say that most of the Health worker Covid Cases were reported in the months of April and May.After September we can see that the Healthworker Covid cases are increasing day by day.To find whether there is any relationship between Hospitalized Covid Patients and healthworkers Lets draw two different Line graphs for both of the data.

Line Graph of Health Workers and Hospitalized Covid Cases

 plt2<-ggplot(data = data_day, aes(x = Date1, y = value )) + 
    geom_line(color = "steelblue")+
    scale_x_date(breaks=data_month$start_date,date_labels = "%b-%y")+xlab('Month')+
    facet_grid(type~.,labeller = label_value)   +ylab('Count of Cases')
 plt2


The line graphs gives us the evidence that health worker covid cases are increased during the same time when there are more number of hospitalized covid cases.In this graph we can see the spikes in the month of October,November and December for both the Healthworker Covid case count and Hospitalized Covid case count.
From June to October the hospitalized Covid Cases are low hence the Healthworker Covid cases are also very low.

By plotting both line graphs will gives us a clearer picture of what is the time gap between peaks of these two data.

Line Graph of Health Workers and Hospitalized Covid Cases in one plot

data_day$type<-factor(data_day$type)
  p<-ggplot(data_day, aes(x = Date1, y = value)) + 
    geom_line(aes(color = type)) + 
    #geom_density(aes(fill=factor(type)), alpha=0.8) +
    scale_colour_discrete("Type of Case")+
    scale_x_date(breaks=data_month$start_date,date_labels = "%b-%y")+xlab('Month')+ylab('Count of Cases')
  
  
  ggplotly(p)


From the Line graph we can see that first peak of both lines occurred in a difference of 15 days.Eventhough both of the line shows a same pattern, we can say that when it comes to pattern health workers data leads by 15 days with respect Hospitalized Covid cases.

Overview of Transmission Data

Data Preview

CCCount is the Close Contact Count
CTCount is the Community Transmission Count
TA is the Travel Abroad Count

trasnsmissionData <- select(covid_data,Date1,Daily_Cases ,CommunityTransmission,CloseContact,TravelAbroad)
countsData<-trasnsmissionData%>%mutate(CTCount=round((CommunityTransmission/100)*Daily_Cases),
                                       CCCount=round((CloseContact/100)*Daily_Cases), TACount=round((TravelAbroad/100)*Daily_Cases))%>%arrange(Date1)
newData<-countsData%>%select(Date1,CTCount,TACount,CCCount)%>%gather(key='type',value = 'value',-Date1)
s <- basicStats(countsData[6:8])[c("Mean", "Stdev", "Median", "Minimum", "Maximum", "nobs"),]
type<-c(rep(newData$type,newData$value))
date<-c(rep(newData$Date1,newData$value))
largeTrasmissionData <- data.frame(date,type)
head(countsData)
##        Date1 Daily_Cases CommunityTransmission CloseContact TravelAbroad
## 1 2020-03-18           0                    40           21           39
## 2 2020-03-19         146                    42           23           35
## 3 2020-03-20         128                    44           23           33
## 4 2020-03-21         124                    43           24           33
## 5 2020-03-22         129                    45           24           31
## 6 2020-03-23         181                    47           24           29
##   CTCount CCCount TACount
## 1       0       0       0
## 2      61      34      51
## 3      56      29      42
## 4      53      30      41
## 5      58      31      40
## 6      85      43      52


Summary Table

datatable(s)


From the table we can say that there is an average count of Community Transmission of 110 and there is also an average count of Close Contact Transmission of 152.
From the table we can say that Compared to other two types of Covid-19 Transmissions,Transmissions because of Travel Abroad is very low(Taking the fact that the maximum count Observed in Travel Abroad Transmission is 75 ).
Lets dive deep into the data by analysing the patterns of these counts.

Visualisation of Transmission Data

Data Preparation Section

parsed_data<-countsData
parsed_data$month=strftime(as.POSIXct(countsData$Date1), "%m")
parsed_data$year=strftime(as.POSIXct(countsData$Date1), "%Y")

#Month wise data
data_month<- aggregate(cbind(CCCount,CTCount,TACount) ~ month + year, parsed_data, FUN = sum)
data_month$start_date<-as.POSIXct(paste(data_month$year, data_month$month, "01", sep = "-"),'GMT')
data_month<-data_month%>%select(start_date,CTCount,TACount,CCCount)%>%gather(key='type',value = 'value',-start_date)
data_month$start_date=as.Date(data_month$start_date)

##Date wise Data
data_day<-parsed_data%>%select(Date1,CTCount,TACount,CCCount)%>%gather(key='type',value = 'value',-Date1)
data_day$type<-factor(data_day$type,labels = c("Close Contact", "Community Transmission", "Travel Abroad") )

 p<-ggplot(data_day, aes(x=type, y=value, fill=type)) +
    geom_boxplot()+ scale_fill_discrete(name = "Transmission Type")+ylab('Count of Cases')+xlab('Transmission Type')
ggplotly(p) 

From the box plot we can see that there are few large value outliers for all three of the transmission type.

Both Close Contact and Community Transmission is right skewed or have positive skewness .Close Contact has more dispersed data when compared to other types.From the box plot its is understood that Difference in medians of Close Contact and Community Transmission is very less.


Overall Cases Proportion based on Transmission Type

By preparing the Donut Chart we will get an idea about what is the overall proportion of each transmission Type.

 col_sum<-colSums(countsData[,6:8])
 pie_data<-data.frame(
   type=c('Community Transmission','Close Contact','Travel Abroad'),
   value=c(col_sum[['CTCount']],col_sum[['CCCount']],col_sum[['TACount']])
 )
 # Calculating the position of labels
 pie_data <- pie_data %>% 
   arrange(desc(type)) %>%
   mutate(prop = value / sum(pie_data$value) *100) %>%
   mutate(ypos = cumsum(prop)- 0.5*prop )

 ggplot(pie_data, aes(x = 2, y = prop, fill = type)) +
   geom_bar(stat = "identity", color = "white") +
   scale_fill_discrete(name = "Covid Cases")+
   coord_polar(theta = "y", start = 0)+
   geom_text(aes( y = ypos,label = round(prop,2)), color = "white") +
   theme_void()+
   labs(title = 'Donut Chart for Overall Cases Proportion')+
   xlim(0.5, 2.5)


From the Donut Chart we can say that most of the Covid Cases transmission type that are reported till now is either from Close Contact Transmission or Community Transmission.Both these transmission types together accounts about 96.0829461% of the total cases.Lets plot the proportion barchart of each day.

Proportion Stacked Barchart based on type of Trasnsmission

From the proportion barchart we can conclude that in the last weeks of March the proportion of all the types of the transmission is almost the same.After that we can see a considerable decrease in the proportion of Transmission due to travel abroad in each day.From the month of June more than 50% of the covid cases reported is due to Close Contact Transmission and about 35% of the cases are due to Community Transmission.

ggplot(data = largeTrasmissionData) +
  geom_bar(mapping = aes(x = date,fill=type ),position = "fill")+ylab('Percentage')+xlab('Month')

Lets plot the grouped bar chart using the Daily Covid Case Count of three transmission types.


Grouped Barchart based on type of Trasnsmission

From the Grouped Barchart we can see a peak of Count of Covid Cases due to Close Contact Transmission in the month of OCtober 2020 in Ireland.There is also another peak in the graph which is the Covid Case Count due to Community Transmission in the month of April.

newData1 <- newData
newData1$type <- factor(newData1$type, 
                          levels = c('CTCount','TACount','CCCount'))
levels(newData1$type) <- c('Community Transmission','Travel Abroad','Close Contact')


M <- newData1 %>% hchart('line',hcaes(x=Date1, y=value, group=type))
M <- M %>% 
  hc_xAxis(title = list(text = "Date"),
           style = list(fontWeight = "bold")) %>%
  hc_yAxis(title = list(text = "Count of Covid Cases"),
           style = list(fontWeight = "bold")) %>%
  hc_legend(title = list(text ='Transmission Type'),
            layout = "vertical", verticalAlign = 'middle',
            align = "right") %>%
  hc_tooltip(crosshairs = TRUE,
             shared = TRUE, 
             borderWidth = 4) %>% 
  hc_exporting(enabled = TRUE)
M

In order to get a clear picture of the trend we can use the Grouped Barchart which shows monthly covid cases count of three Transmission types.


Grouped Bar Chart of Monthly Covid Cases count of three Transmission types

In this Barplot we can see that From March to April there is an increase in the Covid Case Count for all three of Transmission type.Among three Community Transmission marked the highest Count during that period.
In between May and August there was very less Covid Cases Reported,It may be due to the lockdown that is introduced and strictly followed during that time.But From August to October We can see a steady increase in the Count especially the Close Contact Transmission Covid Case count.There is also an increase in the Community Transmission Cases Count during this time but comparatively low when compared to Close Contact.
October Marked the Highest Covid Case Count due to Close Contact and April Marked the Highest Count of Covid Cases for both the Community Transmission and Transmission due to Travel abroad.

ggplot(data_month, aes(fill=type, y=value, x=start_date)) + 
  geom_bar(position="dodge", stat="identity")+ scale_fill_discrete(name = "Transmission Type", labels = c("Close Contact", "Community Transmission", "Travel Abroad"))+
  scale_x_date(breaks=data_month$start_date,date_labels = "%b-%y")+xlab('Month')+ylab('Count of Covid Cases')

Lets analyze the Pattern of each transmission type individually by drawing the line graph for each type.

Line Graph of type of Trasnsmission

The Line graph is giving us the evidence that both Close Contact Transmission and Community transmission patterns look similar which means the count of both the types are increased and decreased at the same time period.
But in the case of Transmission due to travel abroad the cases decreased considerably.There is no cases reported in between mid May to September last week.

 p1<-ggplot(data = data_day, aes(x = Date1, y = value )) + 
  geom_line(color = "steelblue")+scale_x_date(breaks=data_month$start_date,date_labels = "%b-%y")+xlab('')+
  facet_grid(type~.,labeller = label_value)   +ylab('')
 ggplotly(p1)%>% layout(
                  xaxis= list(title = '<b> Month </b>'),
                  yaxis= list(title ='<b> Covid Cases <b>'))

Summary

Section one- Total cases summary

  • The growth factor graphs show the spread of the virus in the population in the long term. It is clear from the trailing growth factor plots that the growth of the virus was beginning to increase in the month of July.

  • The total cases plot follows an exponential trend and this rapid increase in cases is most visible in the months of March and April.

  • The daily cases plot clearly demonstrates the waves in the spread of the virus. The first wave happening in March/April and the second wave occurring in October.

  • The daily death graph indicates most of the deaths occurred during the months of April and May during the first wave. The spikes in October sadly demonstrate more deaths during the second wave of the virus.

Section two- New Cases by Age and Gender

  • The plot of new daily cases has a bimodal distribution.

  • The highest peak of new daily cases recorded was 426, which occurred on the 22nd of April in the Aged 65+ group.

  • The 15-24 and 65+ age groups had the most cases, aged 1 group had the least.

  • The highest monthly peak of new covid cases was 6351 and was recorded during the month of October for the Aged 15-24 group, October had the largest number of overall cases.

  • Overall the number of new daily cases was approximately the same for males and females.

Section three- Hospitalisations

  • Hospitaliastions were compared to the confirmed covid cases. It is clear from the data that hospitalisations remained low despite the high rate of confirmed cases.

  • The age group of 65 up were the group that was most prone to hospitalisations. There was a sharp increase in hospitalisations in this group on the 14th April with 354 hospital cases.

  • Hospital and confirmed covid cases were lowest during the lock downs. ICU cases remain low throughout the months since March.

Section four- Transmission & Healthcare Workers

  • Close Contact and Community Transmission is the most common way of transmission.

  • Highest Peak of transmission occurred during the month of October.

  • The average health workers case is about 24 cases a day.